github.com/bio-raum - bioinformatic resources for surveillance in food safety and public health.

GABI: Genomic Analysis of Bacterial Isolates

Run Infos

User: mhoeppner
Date: 2025-02-07 08:17:45.726670
Pipeline version: 0.9.6
Command line call: nextflow run /home/mhoeppner/git/bio-raum/gabi/main.nf -profile lsh --input samples.csv --run_name Test -resume
Work directory: /work_syn/ngs/pipelines/gabi/test/work

Summary

Sample
Status: The overall analysis status. pass: ok to use; warn: potential issues found; fail: most probably not usable.
Best-guess taxon: The highest-scoring taxon using kmer matching (S/MASH).
Reference genome: The highest-matching hit in RefSeq to this assembly.
Assembly: Information about this assembly.
  % Cov
  Size (Mb)
  #Contigs: The number of chromosomal contigs, i.e. without plasmids.
  N50 (Kb): The contig size at or above which the contigs together represent 50% of the assembly.
  Gene space (%): The fraction of broadly conserved genes fully covered in this assembly (BUSCO).
  GC (%): GC content of the assembly. Deviations from the species default are highlighted in orange (mild) and red (strong, something is likely wrong).
Mean coverage: Mean coverage of reads mapped back to the assembly; bigger is better.
  Total, ILM, ONT, HiFi
Coverage 40X (%): Percentage of the assembly covered at 40X or more.
  Total: Across all sequencing technologies.
  ILM: Illumina reads 40X.
  ONT: ONT reads 40X.
  HiFi: Pacbio reads 40X.
Read quality: Quality metrics of reads after trimming.
  ILM Q30 (%): Fraction of Illumina reads above Q30.
  ONT Q15 (#): Number of ONT reads above Q15.
  ONT N50 (bp): N50 of ONT reads.
Contamination: Indicators of contamination.
  Confindr (%): Based on sequence variation; not 100% reliable for ONT data.
  Taxa >10%

SAMEA2707760_Ecoli
Status: pass
Best-guess taxon: Escherichia coli
Reference genome: GCF_002176515.1
Assembly: % Cov 98.2; Size (Mb) 5.27; #Contigs 136; N50 (Kb) 160.67; Gene space (%) 100.0; GC (%) 50.34
Mean coverage: Total 87.68; ILM 87.68; ONT -; HiFi -
Coverage 40X (%): Total 100; ILM 100; ONT -; HiFi -
Read quality: ILM Q30 (%) 90.0; ONT Q15 (#) -; ONT N50 (bp) -
Contamination: Confindr (%) -; Taxa >10% 1
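
As an illustration of how two of the summary metrics are defined, the following minimal Python sketch computes an N50 from a list of contig lengths, and a mean coverage plus the percentage of positions at 40X or more from a list of per-base depths. The function names and input values are made up for this example; this is not GABI's own code, which takes these values from its underlying tools.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together hold at least half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def coverage_summary(depths, threshold=40):
    """Return (mean depth, percent of positions with depth >= threshold)."""
    if not depths:
        return 0.0, 0.0
    mean_depth = sum(depths) / len(depths)
    covered = sum(d >= threshold for d in depths)
    return mean_depth, 100.0 * covered / len(depths)


# Made-up example values, for illustration only.
example_contigs = [400, 300, 200, 100]
example_depths = [50, 60, 30, 45]

print(n50(example_contigs))
print(coverage_summary(example_depths))
```

In this toy input, the two largest contigs (400 and 300) already account for more than half of the 1000 bp total, so the N50 is the smaller of the two.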


Assembly metrics

Descriptive metrics of individual assemblies determined by Quast.

Sample: SAMEA2707760_Ecoli
Assembly size (Mb): 5.27
Fraction of reference: 98.2
Ns per 100kb: 0.00
Largest contig (Kb): 375.97
Misassembled contigs: 78
Contigs > 1kb: 106
Contigs > 5kb: 60
Size (Mb) in contigs > 1kb: 5.24
Size (Mb) in contigs > 5kb: 5.13


Insert size distribution (Illumina)

Insert size refers to the size of the sequenced DNA fragment. Depending on the exact library protocol, insert sizes will cluster around a mean value (~300-500bp). For Illumina data, that value should typically be (slightly) larger than the combined length of the forward and reverse reads for optimal data yield. Very flat curves may (depending on the protocol!) indicate a failure during fragment size selection/enrichment. Neither small insert sizes nor flat curves are clear predictors of subsequent assembly issues, but they can inform debugging efforts.
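
The comparison between insert size and combined read length can be sketched as follows. This is a minimal illustration with made-up numbers; the function name `insert_size_check` and the 150 bp read length are assumptions for this example and are not taken from the pipeline.

```python
from statistics import mean


def insert_size_check(insert_sizes, read_length):
    """Compare observed insert sizes with the combined length of a read pair."""
    combined = 2 * read_length  # forward + reverse read
    # Fragments shorter than the combined read length yield overlapping pairs.
    overlapping = sum(size < combined for size in insert_sizes)
    return {
        "mean_insert": mean(insert_sizes),
        "combined_read_length": combined,
        "pct_overlapping_pairs": 100.0 * overlapping / len(insert_sizes),
    }


# Made-up insert sizes (bp) for a hypothetical 2x150 bp run.
result = insert_size_check([280, 320, 350, 410], read_length=150)
print(result)
```

Here one of the four fragments is shorter than 300 bp, so a quarter of the read pairs would overlap and sequence some bases twice.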


BUSCO scores

BUSCO scores describe the coverage of the assembled gene space against a set of broadly conserved singleton genes (here: bacteria_odb10). A perfect assembly should have complete coverage of the gene space (complete: 100%), without any fragmentation or, worse, duplication. A high duplication value may indicate assembly errors or contamination. Some taxa with very streamlined gene content, such as Campylobacter, will typically have a completeness score of less than 100%. The completeness estimates may include duplicated genes, so values greater than 100% are possible (i.e. all genes present, of which x % are duplicated).
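
For readers who want to inspect BUSCO results outside of this report, the sketch below parses BUSCO's one-line summary notation (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing, n: number of genes searched) and flags low completeness or high duplication. The summary strings and both thresholds are made up for illustration and are not the values GABI uses.

```python
import re

# Pattern for BUSCO's one-line summary notation.
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse a BUSCO summary string into a dict of percentages plus n."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(scores["n"])
    return scores


def busco_flags(scores, min_complete=95.0, max_duplicated=5.0):
    """Return warnings; both thresholds are hypothetical, not GABI's."""
    flags = []
    if scores["C"] < min_complete:
        flags.append("low completeness")
    if scores["D"] > max_duplicated:
        flags.append("high duplication")
    return flags


# Made-up summary strings for illustration.
clean = parse_busco("C:99.2%[S:98.4%,D:0.8%],F:0.0%,M:0.8%,n:124")
suspect = parse_busco("C:98.4%[S:80.6%,D:17.8%],F:0.8%,M:0.8%,n:124")
print(busco_flags(clean))
print(busco_flags(suspect))
```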


Bracken - taxonomic composition (ILLUMINA)

Bracken processes raw outputs from Kraken2, which matches kmers from raw sequencing reads against a reference database to determine the taxonomic composition of a read set. For DNA from pure cultures (which is the focus of GABI), only one species should be identified at dominant proportions. For some taxa, like Campylobacter, several species from the same genus may be found at comparable abundances due to a lack of sufficient DNA differences. Otherwise, identification of multiple taxa at higher proportions may indicate a contamination issue.
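
A simple way to check a Bracken result for more than one dominant taxon is sketched below. It reads a Bracken abundance table (tab-separated, with the `name` and `fraction_total_reads` columns of Bracken's standard output) and lists all taxa above a minimum fraction. The read counts are made up, and the 10% cutoff mirrors the "Taxa >10%" column of the summary only as an assumption about how that value might be derived.

```python
import csv
import io


def dominant_taxa(bracken_text, min_fraction=0.10):
    """Return (name, fraction) pairs above min_fraction, most abundant first."""
    reader = csv.DictReader(io.StringIO(bracken_text), delimiter="\t")
    hits = [
        (row["name"], float(row["fraction_total_reads"]))
        for row in reader
        if float(row["fraction_total_reads"]) > min_fraction
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up Bracken table for illustration.
HEADER = [
    "name", "taxonomy_id", "taxonomy_lvl", "kraken_assigned_reads",
    "added_reads", "new_est_reads", "fraction_total_reads",
]
ROWS = [
    ["Escherichia coli", "562", "S", "900", "50", "950", "0.95"],
    ["Shigella sonnei", "624", "S", "30", "10", "40", "0.04"],
    ["Salmonella enterica", "28901", "S", "8", "2", "10", "0.01"],
]
EXAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

taxa = dominant_taxa(EXAMPLE)
print(taxa)
if len(taxa) > 1:
    print("more than one taxon above the cutoff: check for contamination")
```

A single entry in the returned list corresponds to the expected result for a pure culture; two or more warrant a closer look, keeping in mind the genus-level caveat described above.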

MLST

Taxon-specific MLST schemes classify assemblies into pre-defined types or groups. Results are divided by typing scheme (and consequently by taxon).

Scheme: ecoli (Escherichia coli)
Sample MLST type
SAMEA2707760_Ecoli 296

Scheme: ecoli_achtman_4 (Escherichia coli)
Sample MLST type
SAMEA2707760_Ecoli 11


Serotyping

Serotypes, similar to MLST types, classify assemblies based on a set of predefined gene profiles.

ectyper (Escherichia coli)
Sample Serotype
SAMEA2707760_Ecoli O157:H7

Stecfinder (Escherichia coli)
Sample Serotype
SAMEA2707760_Ecoli O157:H7


Software versions

ABRICATE_RUN
  abricate: 1.0.1
ABRICATE_RUN_ECOLI_VIRULENCE
  abricate: 1.0.1
BCFTOOLS_STATS
  bcftools: 1.20
BRACKEN_BRACKEN
  bracken: 2.9
BUSCO_BUSCO
  busco: 5.3.0
CONFINDR
  confindr: 0.7.4
CUSTOM_DUMPSOFTWAREVERSIONS
  python: 3.11.7
  yaml: 5.4.1
DNAAPLER
  dnaapler: 1.1.0
DOWNLOAD_GENOME
  datasets: 16.22.1
ECTYPER
  ectyper: 2.0.0
FASTP
  fastp: 0.23.4
FASTQC
  fastqc: 0.12.1
HAMRONIZATION_ABRICATE
  hamronization: 1.1.4
HAMRONIZATION_AMRFINDERPLUS
  hamronization: 1.1.4
HAMRONIZATION_SUMMARIZE
  hamronization: 1.1.4
KRAKEN2_KRAKEN2
  kraken2: 2.1.2
  pigz: 2.6
MLST
  mlst: 2.23.0
MOBSUITE_RECON
  mobsuite: 3.1.8
PROKKA
  prokka: 1.14.6
QUAST
  quast: 5.2.0
SHOVILL
  shovill: 1.1.0
SNIPPY_RUN
  snippy: 4.6.0
SOURMASH_SEARCH
  sourmash: 4.8.4
SOURMASH_SKETCH
  sourmash: 4.8.4
STECFINDER
  stecfinder: 1.1.0
TABIX_TABIX
  tabix: 1.20
Workflow
  Nextflow: 24.04.4
  bio-raum/gabi: 0.9.6

Report generated by bio-raum/gabi. Please check out our documentation.